@ueshin
Member

@ueshin ueshin commented Oct 29, 2025

What changes were proposed in this pull request?

Adds logging support for Pandas and Arrow UDFs.

Why are the changes needed?

The basic logging infrastructure was introduced in #52689, and other UDF types should also support logging.

This PR adds that support for Pandas and Arrow UDFs.

Does this PR introduce any user-facing change?

Yes, the logging feature will be available in Pandas/Arrow UDFs.
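For illustration, a minimal sketch of what a logging Pandas UDF body might look like. It uses only the standard `logging` module and pandas. The logger name and the `plus_one` function are hypothetical, and how the emitted records are collected and surfaced by Spark is defined by #52689 and is not shown here.

```python
import logging

import pandas as pd

# Hypothetical logger name, chosen only for this sketch.
logger = logging.getLogger("example.pandas_udf")


def plus_one(values: pd.Series) -> pd.Series:
    """Body of a Series-to-Series Pandas UDF that logs once per batch."""
    logger.info("plus_one received a batch of %d rows", len(values))
    return values + 1


# With PySpark installed, the function could be wrapped as a Pandas UDF:
#   from pyspark.sql.functions import pandas_udf
#   plus_one_udf = pandas_udf(plus_one, returnType="long")
#   df.select(plus_one_udf("id"))
```

Keeping the UDF body a plain function makes it possible to exercise the logging calls locally before wrapping it with `pandas_udf`.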

How was this patch tested?

Added the related tests.

Was this patch authored or co-authored using generative AI tooling?

No.

@dongjoon-hyun
Member

Thank you, @ueshin.

"spillSize" -> SQLMetrics.createSizeMetric(sparkContext, "spill size")
)

private[this] val sessionUUID = {
Contributor

nit: can we move this code into the Python runner so that it is shared among subclasses?

Member Author

@ueshin ueshin Oct 30, 2025

Unfortunately, no. The Python runner already runs on the executor, where the session is not available. We could refactor it later, though.

Member

@dongjoon-hyun dongjoon-hyun left a comment

+1, LGTM.

@dongjoon-hyun
Member

Merged to master for Apache Spark 4.1.0. Thank you, @ueshin and all.

Yicong-Huang pushed a commit to Yicong-Huang/spark that referenced this pull request Oct 30, 2025
Closes apache#52785 from ueshin/issues/SPARK-53976/pandas_arrow_udfs.

Authored-by: Takuya Ueshin <ueshin@databricks.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>